Beyond Basic R

2016-05-05

Mansun Kuo



What is in this lecture

Additional knowledge and handy packages for using R as a data manipulation and web scraping tool.

Outline



Package

libPaths

R searches its library directories for packages in the order returned by .libPaths():

.libPaths()
## [1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"      
## [3] "/usr/lib/R/library"

You can also prepend an additional library path, which is then searched first:

.libPaths(c("your/library/path", .libPaths()))
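A minimal sketch of the effect of prepending a path, assuming a throwaway directory under tempdir() (the name "extra-library" is hypothetical). The directory has to exist, because .libPaths() keeps only existing directories:

```r
# Sketch: prepend a library directory, inspect the search order, then restore it.
# "extra-library" is a hypothetical directory created under tempdir().
extra_lib <- file.path(tempdir(), "extra-library")
dir.create(extra_lib, showWarnings = FALSE, recursive = TRUE)

old_paths <- .libPaths()
.libPaths(c(extra_lib, old_paths))

# The first entry is searched first by library(), and it is the
# default installation target of install.packages()
first_path <- .libPaths()[1]
print(basename(first_path))

# Restore the original search order
.libPaths(old_paths)
```

The change only lasts for the current R session; a permanent setting would normally go into a startup file such as .Rprofile.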

Change Privilege (Windows)


magrittr



Data I/O



Scheduling

Task Scheduling in R

Sometimes you need to execute an R script periodically and automatically.

Steps to Scheduling

  1. Invoking a batch job of R
    • Rscript
    • R CMD BATCH
  2. Passing arguments to R
  3. A scheduler:
    • Linux/Mac: crontab
    • Windows: Task Scheduler

Open shell with RStudio

A shell opened this way has the environment variables for R and Rscript exported.

Invoke a batch job

Rscript <PATH OF YOUR R SCRIPT> <ARG1> <ARG2> ...
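Inside the script, the values given as <ARG1> <ARG2> ... are available through commandArgs(trailingOnly = TRUE). A minimal base R sketch follows; parse_trailing() is a hypothetical helper written for this illustration, and the vector passed to it stands in for a real command line:

```r
# Sketch: read trailing command-line arguments in base R.
# parse_trailing() is a hypothetical helper, not part of any package.
parse_trailing <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) == 0) {
    # No arguments were supplied on the command line
    return(list(who = NA_character_, rest = character(0)))
  }
  # Treat the first argument as a name and keep the remainder as-is
  list(who = args[1], rest = args[-1])
}

# Simulate "Rscript script.R Mansun extra" without leaving the R session
parsed <- parse_trailing(c("Mansun", "extra"))
print(parsed$who)
print(parsed$rest)
```

All arguments arrive as character strings, so numeric values need an explicit conversion such as as.numeric().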

Shebang

On Linux or Mac, you can execute an R script directly if it starts with a shebang line and has execute permission.

#!/usr/bin/env Rscript

Add execute permission:

chmod u+x schedualing/now.R

Check permission:

ls -l schedualing/now.R
## -rwxrw-rw- 1 mansun mansun 1130 May  5 16:19 schedualing/now.R
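The same permission change can be sketched from within R on a Unix-like system. This example assumes a throwaway script created with tempfile() rather than the real schedualing/now.R, and uses Sys.chmod() and file.info() from base R:

```r
# Sketch: create a script with a shebang line and make it executable (Unix-like OS).
# The script path is a temporary file used only for illustration.
script <- tempfile(fileext = ".R")
writeLines(c("#!/usr/bin/env Rscript", 'cat("hello\\n")'), script)

# Roughly equivalent to "chmod u+x": rwx for the owner, read-only for others
Sys.chmod(script, mode = "0744", use_umask = FALSE)

# Inspect the mode bits; octal 100 is the owner's execute bit
mode_after <- file.info(script)$mode
user_can_execute <- bitwAnd(as.integer(mode_after), strtoi("100", base = 8L)) != 0
print(mode_after)
print(user_can_execute)
```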

Passing arguments

In most situations, I use a package to handle argument parsing.

now.R

#!/usr/bin/env Rscript

# Use argparser
library(argparser, quietly = TRUE)

# Create an arg.parser object.
p <- arg_parser("Hi! What time is it?")

# Add a positional argument
p <- add_argument(p, arg = "who", help = "Who are you")

# Add an optional argument
# Rscript will raise a warning message if you pass -g,
# but it doesn't matter
p <- add_argument(p, arg = "--greeting", short = "-g", default = "How's going?",
                  type = "character", help = "Greeting word")

# Add a flag, default value is FALSE
p <- add_argument(p, arg = "--chat", short = "-c", flag = TRUE,
                  help = "Whether or not to have a greeting")

# Parse commandArgs(trailingOnly = TRUE) into args
args <- parse_args(p)

print(args)
str(args)

# The original arguments
command_args <- commandArgs(trailingOnly = TRUE)
cat("args:")
print(command_args)

# Get system time
now <- as.character(Sys.time())

# Construct greeting string
greeting <- sprintf("Hi %s! It is %s.", args$who, now)

if (args$chat) {
    greeting <- paste(greeting, args$greeting)
}

print(greeting)

# Write to a text file
# (writeLines ends the line with a newline; writeChar would append a NUL byte by default)
writeLines(greeting, "now.txt")

Exercise

Let’s try to pass some arguments to schedualing/now.R:

Get help:

schedualing/now.R -h

Pass a positional argument:

schedualing/now.R Mansun

Add a flag -c:

schedualing/now.R Mansun -c

Add an optional argument:

schedualing/now.R Mansun -c -g "How are you?"
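The last command line can also be simulated inside an R session, which may help when debugging a parser. This sketch assumes the argparser package is installed (it is skipped otherwise) and rebuilds a parser with the same arguments as now.R; the argv vector stands in for the real command line:

```r
# Sketch: parse a simulated command line inside R.
# Assumes the argparser package is installed; the block is skipped otherwise.
if (requireNamespace("argparser", quietly = TRUE)) {
  p <- argparser::arg_parser("Demo parser")
  p <- argparser::add_argument(p, arg = "who", help = "Who are you")
  p <- argparser::add_argument(p, arg = "--greeting", short = "-g",
                               default = "How's going?", type = "character",
                               help = "Greeting word")
  p <- argparser::add_argument(p, arg = "--chat", short = "-c", flag = TRUE,
                               help = "Whether or not to have a greeting")

  # argv replaces the default commandArgs(trailingOnly = TRUE)
  parsed <- argparser::parse_args(p, argv = c("Mansun", "-c", "-g", "How are you?"))
  print(parsed$who)
  print(parsed$chat)
  print(parsed$greeting)
}
```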

Crontab (Linux/Mac)

The crontab command maintains per-user crontab files, which list commands to be executed on a schedule on Unix-like operating systems.

crontab [-u user] file  
crontab [ -u user ] [ -i ] { -e | -l | -r }  
    -e      (edit user's crontab)  
    -l      (list user's crontab)  
    -r      (delete user's crontab)  
    -i      (prompt before deleting user's crontab)  

crontab -e opens your crontab in a command-line editor such as vim. On Debian-based systems, type select-editor in a terminal to choose your preferred editor.

Your first crontab

crontab.txt

# Execute every minute
* * * * * cd /home/mansun/github/BeyondBasicR/schedualing; ./now.R Mansun

Install the crontab file and list the result:

crontab schedualing/crontab.txt
crontab -l
## # Execute every minute
## * * * * * cd /home/mansun/github/BeyondBasicR/schedualing; ./now.R Mansun
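A crontab file like this can also be generated from R. The sketch below is only an illustration: make_cron_line() is a hypothetical helper, the file is written to a temporary location, and the installation step is left commented out so that running the code does not touch your real crontab:

```r
# Sketch: build a crontab entry and write it to a file.
# make_cron_line() is a hypothetical helper, not part of any package.
make_cron_line <- function(schedule, workdir, command) {
  # Quote the directory so that paths containing spaces still work in the shell
  sprintf("%s cd %s; %s", schedule, shQuote(workdir, type = "sh"), command)
}

line <- make_cron_line("* * * * *",
                       "/home/mansun/github/BeyondBasicR/schedualing",
                       "./now.R Mansun")
print(line)

# writeLines() ends every line with a newline
crontab_file <- tempfile(fileext = ".txt")
writeLines(c("# Execute every minute", line), crontab_file)

# Installing it would replace the current user's crontab, so it is not run here:
# system2("crontab", crontab_file)
```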

Configure your cron job

┌───────────── min (0 - 59) 
│ ┌────────────── hour (0 - 23)
│ │ ┌─────────────── day of month (1 - 31)
│ │ │ ┌──────────────── month (1 - 12)
│ │ │ │ ┌───────────────── day of week (0 - 6) (0 to 6 are Sunday to Saturday)
│ │ │ │ │
* * * * *  command to execute

# Execute at 00:00 and 12:00 every day
0 0,12 * * *  command to execute
# Execute at 06:00 every Monday to Friday
0 6 * * 1-5  command to execute
Please refer to Cron for further information.

Task Scheduler (Windows)

Launch Task Scheduler on Windows:

  1. Press Windows logo key + R to open the Run dialog box
  2. Enter control schedtasks

Launch Task Scheduler from within RStudio:

# Execute a system command to launch Task Scheduler
system("control schedtasks")

Set trigger

Set job

Exercise

Set up a scheduled job in your environment.
